REMARKS 

Applicant has amended claim 11 to recite dependency upon claim 10. The examiner's
remaining claim rejections were made under 35 U.S.C. §103 based upon the combination of
Lakshmi in view of Kakazu. The applicant responds as follows:
The Teachings of Lakshmi: 

Lakshmi is directed to methods for searching a database. Database searches can be
lengthy procedures requiring extensive I/O operations, memory and processor capacity. Lakshmi
is directed to a technique to construct an "optimal" database search strategy, optimal in the sense
of reducing costs, system resources or other desired parameters. Lakshmi uses a neural network
in his technique, as follows: (1) a user search query (such as a SQL query) is received by the
"optimizer"; (2) the optimizer extracts standard features of the search query to form a feature
vector; (3) the feature vector is input into a neural network (NN), where the NN is trained to
output or predict cost values (like expected I/O calls, selectivity values for the data types, cost
per call (processor resources, etc.)); and (4) the "cost values" are used by the optimizer to
construct an optimized search strategy (for instance, to search using sequential table scans,
B-tree scan indexes, etc.). The optimized search strategy is provided to the database management
system search engine, which then undertakes the search of the database and outputs the results of
the search. See generally, Col. 5, lines 22-36. The database may be located on one computer or
spread across several computers.
The Application of Lakshmi to the Invention: 

The examiner has indicated that Lakshmi teaches a system for receiving a data request,
and assigning one computer from a plurality to service the data request. The examiner indicates
that this is taught in the abstract, or in Col. 33, lines 66-67, or Col. 12, lines 53-63. This teaching
is neither present in nor suggested by Lakshmi. Lakshmi is directed to techniques for optimizing
database searches, that is, techniques for searching through a database. Lakshmi does not teach
or suggest any means of choosing a particular computer from a plurality of computers to respond
to a data request.

Applicant's claims are not directed to "finding" data responsive to a query through a
search, as Lakshmi is. Applicant's invention is directed to choosing one computer from a
plurality to respond to the data request, subject matter completely different from that of
Lakshmi. Applicant does not search for data responsive to the request;
instead, Applicant searches for a computer to receive the request for response, where each
computer is capable of responding to the request.

The examiner also indicates that Lakshmi teaches selecting a computer assignment
associated with an output node of the neural network. Lakshmi fails to so teach. Again,
Lakshmi is not directed to choosing computers, but to constructing optimal search strategies.
The Teachings of Kakazu: 

Kakazu teaches methods to check the input/output characteristics of a neural network.
Kakazu teaches checking the input/output characteristics as follows: choosing one input node,
inputting a predetermined range of variable data to the selected node while holding the input
data to the remaining nodes constant, and examining the resultant output data from the output
nodes. See Col. 2, lines 21-30.
The Application of Kakazu to the Invention: 

The examiner has indicated that Kakazu teaches associating each output node of the
neural network with a computer from a plurality of computers. The examiner indicates this
teaching is in the abstract. Such a teaching is not in the abstract, nor anywhere else in Kakazu.
Indeed, Kakazu does not mention multiple computers anywhere in the patent.

The examiner also indicates that Kakazu teaches inputting into the input layer of the
neural network a vector R, where the entries of this vector are dependent upon the number of
requests of the requested data set over a predetermined period of time. The examiner cites Col. 2,
lines 27-30, and Col. 4, lines 62-67 for this proposition. However, a reading of these sections
indicates only that an input vector is used, and that one of the components of this input vector can
be varied to examine the resulting changes in the output vector. Nowhere does Kakazu teach or
suggest associating the input vector components with the number of prior requests for a
particular data set over a predetermined period of time. Indeed, Kakazu would not suggest such,
as Kakazu does not deal with multiple computers or multiple data sets.
The Combination of Lakshmi and Kakazu: 

The examiner has used the combination of Lakshmi and Kakazu on all of applicant's 
claims, and in particular, applicant's independent claims. The applicant respectfully submits that 
the examiner has not established a prima facie case of obviousness as required under MPEP 
§706.02(j) for these independent claims, and hence, all the dependent claims. That section 
provides in part: 

After indicating that the rejection is under 35 U.S.C. 103, the examiner should set 
forth in the Office action: 

(A) the relevant teachings of the prior art relied upon, preferably with reference to 
the relevant column or page number(s) and line number(s) where appropriate, 

(B) the difference or differences in the claim over the applied reference(s), 

(C) the proposed modification of the applied reference(s) necessary to arrive at 
the claimed subject matter, and 

(D) an explanation why one of ordinary skill in the art at the time the invention 
was made would have been motivated to make the proposed modification.

To establish a prima facie case of obviousness, three basic criteria must be met. First,
there must be some suggestion or motivation, either in the references themselves or in the 
knowledge generally available to one of ordinary skill in the art, to modify the reference or to 
combine reference teachings. Second, there must be a reasonable expectation of success. Finally, 
the prior art reference (or references when combined) must teach or suggest all the claim 
limitations. The teaching or suggestion to make the claimed combination and the reasonable 
expectation of success must both be found in the prior art and not based on applicant's disclosure. 
In re Vaeck, 947 F.2d 488, 20 USPQ2d 1438 (Fed. Cir. 1991). 

The Examiner's rejections under 35 U.S.C. §103 consist of a recitation of the elements
of the claims which are alleged to be found in two separate references followed by a conclusion
that it would be obvious to combine the elements from the two separate references. However,
the cited references fail to disclose or teach those elements of the Applicant's claims, contrary
to the examiner's suggestion. The examiner has cited Lakshmi and Kakazu as reciting elements
which are simply not present in these references. The examiner has therefore failed to make a
prima facie case of obviousness.

Because the examiner has failed to make a prima facie case with respect to the
independent claims, the examiner's rejections of the remaining dependent claims do not need to
be addressed.

CONCLUSIONS 

It is believed that the application is now in a condition for allowance. It is therefore
respectfully requested that the Examiner reconsider the rejections made in light of the
amendments and remarks presented herein, and that the remaining pending claims be allowed.
The undersigned asks that the Examiner contact him at (225) 248-2104 if he has any questions so
that early allowance might be reached.



DATE: 



Respectfully submitted, 



Bernard F. Meroney, Reg. No. 37,188
Attorney for Applicant
Jones, Walker, Waechter, Poitevent, Carrere
& Denegre, L.L.P.
8555 United Plaza Blvd., 4th Floor
Baton Rouge, Louisiana 70809
Telephone: (225) 248-2104

RECEIVED OCT 01 2004, Technology Center 2100

REPLACEMENT SPECIFICATION

(a) TITLE OF INVENTION: Method of Allocation of Web Pages Using Neural Networks

(b) CROSS-REFERENCE TO RELATED APPLICATIONS: Not applicable. 

(c) STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH: Not applicable. 

(d) INCORPORATION-BY-REFERENCE OF MATERIAL ON CD: Not applicable.

(e) BACKGROUND OF THE INVENTION:

(1) Field of Invention: The invention relates to methods of allocating page requests to servers on
a web farm and, more particularly, to using a neural network to allocate page requests to web
farm servers.

(2) Description of Related Art

The World Wide Web offers tremendous opportunities for marketers to reach a vast
variety of audiences at less cost than any other medium. Recent studies have shown that the web
consumes more Internet bandwidth than any other application. With the huge amount of capital
invested in these sites, it has become necessary to understand the effectiveness and realize the
potential opportunities offered by these services.

The number of Web sites on the Internet has grown from an estimated 11,000 sites in
1994 to over 4 million in 2000. The traffic load on a web site is normally measured in terms of
the number of HTTP requests handled by the web site. Web sites with heavy traffic loads must use
multiple servers running on different hardware; consequently, this structure facilitates the sharing
of information among servers through a shared file system or via a shared data space. Examples
of such a system include the Andrew File System (AFS) and the Distributed File System (DFS). If
no such facility is available, then each server may have its own independent file system.

There are four basic approaches to route requests among the distributed Web-server nodes: (1)
client-based, (2) DNS-based, (3) dispatcher-based, and (4) server-based. In the client-based
approach, requests can be routed to any Web server architecture even if the nodes are loosely
connected or are not coordinated. The routing decisions can be embedded by the Web clients
like browsers or by the client-side proxy servers. For example, Netscape spreads the load among
various servers by selecting a random number i between 1 and the number of servers and directs
the requests to the server wwwi.netscape.com. This approach is not widely applicable as it is not
easily scalable and many Web sites do not have browsers to distribute loads among servers.
Client-side proxy servers, for their part, require modifications on Internet components that are
beyond the control of many institutions that manage Web server systems.

In DNS-based systems, by translating from a symbolic name to an IP address, the
DNS can implement a large set of scheduling policies. Although it can be scaled easily from
LAN to WAN distributed systems, the DNS approach is limited by the constraint of 32 Web
servers for each public URL because of UDP packet size constraints.

In the dispatcher-based approach, a single entity controls the routing decisions, which can be
implemented through a wide variety of algorithms. Dispatcher failure can disable the system.
However, as a centralized controller, the dispatcher can achieve fine-grained load balancing.

The server-based approach can be viewed as a combination of the DNS approach and the
dispatcher approach. In the server-based approach, two levels of dispatching are used: (1) the
cluster DNS first assigns a client request to a Web server; and (2) each Web server may reassign
the request to any other server of the cluster. It can achieve the same fine-grained control over
request assignments as the dispatcher approach and reduces the impact of a central dispatcher
failure, but redirection mechanisms typically increase the latency time perceived by the users.

Only the Internet2 Distributed Storage Infrastructure Project (I2-DSI) proposes a "smart"
DNS that uses network proximity information such as transmission delays in making routing
decisions, as proposed by M. Beck, T. Moore, "The Internet2 Distributed Storage Infrastructure
Project: An architecture for Internet content channels," Proc. of 3rd Workshop on WWW
Caching, Manchester, England, June 1998.

Traditional scheduling algorithms for distributed systems are not generally applicable
to control Web server clusters because of the non-uniformity of load from different client
domains, high variability of real Web workload, and a high degree of self-similarity in the Web
requests. The Web server load information becomes obsolete quickly and is poorly correlated
with future load conditions. Further, because the dynamics of the WWW involves high
variability of domain and client workloads, exchange of information about the load condition of
servers is not sufficient to provide scheduling decisions. What is needed is a real time adaptive
mechanism that adapts rapidly to a changing environment. However, none of the approaches
incorporates any kind of intelligence or learning in routing of Web requests.

Further, in any routing scheme, request turnaround time (time to service the request) can be
greatly decreased if the server chosen to respond to a request has the requested file in that
server's cache memory. For instance, requests encrypted using Secure Socket Layer (SSL) use a
session key to encrypt information passed between a client and a server. Since session keys are
expensive to generate, each session key has a lifetime of about 100 seconds and requests
between a specific client and server within the lifetime of the key use the same session key. So it
is highly desirable to route multiple requests from the same client to the same server, as a
different server may not know about the session key, and
routing to the same server increases the probability that the prior request is still in the system's
cache memory, further decreasing the time required to service the user request. One proposal that
combines caching and server replication for client-side proxy servers is given by M. Baentsch, L.
Baum, G. Molter, "Enhancing the Web's infrastructure: From caching to Replication," IEEE
Internet Computing, Vol. 1, No. 2, pp. 18-27, Mar.-Apr. 1997. However, a general scheme to
increase the probability that the server chosen to service a particular request has the requested
page in cache is not presently available.

(f) BRIEF SUMMARY OF THE INVENTION

It is an object of the invention to provide a technique of servicing file requests on a web
farm to increase the probability that the server selected to service the file request will have the
requested file in cache.

It is an object of the present invention to provide a routing system that reduces or 
eliminates the need for client side caching. 

It is an object of the invention to assist load balancing across the servers in a web farm. 

The invention is a system to route requests in a web farm through the use of a routing
algorithm utilizing a neural network with at least two layers, an input layer and an output layer.
The input layer corresponds to the page identifiers P(j) and a function of the number of requests
for that specific page R(P(j)) over a period of time. The outputs are the servers, S(i). A particular
server S(K) is chosen to service a particular page request P(J) by minimizing (over i), using a
suitable metric, the "distance" between R(P(J)) and w(J,i), where w(j,i) is the set of weights
connecting the input layer nodes to the output layer nodes. The neural weight w(J,K) is then
updated, using a neighborhood function and a balancing function. The preferred update
rule is defined to be a gradient descent rule for a corresponding energy
function. Heuristics to select parameters in the update rule that provide balance between hits and
load balancing among servers are included.

Simulations show an order of magnitude improvement over traditional DNS-based
load-balancing approaches. More specifically, performance of our algorithm ranged from 85% to
98% hit rate compared to a performance range of 2% to 40% hit rate for a round robin scheme
when simulating real Web traffic. As the traffic increases, our algorithm performs much better
than the round robin scheme. A detailed experimental analysis is presented in this paper.
(g) BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 shows a schematic of a general web farm using a router to distribute requests to 
the servers in the web farm. 

Figure 2 shows a schematic depicting the general Kohonen network of an input layer, an 
output layer, and the weights connecting the two layers. 

Figure 3 shows a simplified Kohonen network. 

Figure 4a shows a cluster of web pages on a site. 

Figure 4b shows the framework of the invention, routing requests through a neural
network.

Figure 5 is a flowchart showing implementation of one embodiment of the invention. 
(h) DETAILED DESCRIPTION OF THE INVENTION

As used in this application, a Web server farm, or server cluster, refers to a Web site that
uses two or more servers to service user requests. Typically, a single server can service user
requests for the files (such as pages) of a Web site, but larger Web sites may require
multiple servers. The Web farm servers do not have to be physically located at a common site. A
Web farm also refers to an ISP (Internet service provider) that hosts sites across multiple servers,
or that may store frequently requested pages across more than one server to reduce the time to
service a user request for these pages.

The servers in a web farm may have individual operating systems or a shared operating 
system and may also be set up to provide load balancing when traffic to the web site is high. In a 
server farm, if one server fails, another can act as backup. 

Web farms or clusters typically have a single machine or interface acting to distribute
(dispatch) the file requests to servers in the farm. Such a single machine will be termed a
proxy-server (proxy for the entire site), or a router. Figure 1 shows an example of this type of
system, where requests may come from various client
sites 1 to the router 2, which then pools the requests and directs them to a specific server 3. Here
the servers $S_1, \ldots, S_n$ each have their own cache memory 4 and may share a common file system 5.
Each of these servers may also have its own individual storage. The router decides the
allocation of web page requests to individual servers, and then dispatches a particular request to a
particular server. The router may also be a server, which services particular requests.

In large systems or sites, router tasks may be undertaken by a plurality of machines or
routers, and may include an organizational structure to allocate tasks amongst the routers. For
instance, certain pages may only be available from a sub-set or cluster of the overall servers on
the web farm. Input to each cluster may be made simultaneously, with only the cluster storing the
requested file responding to the request. Alternatively, input to a servicing cluster may be
determined by a master distributing router, which then allocates the serving cluster based upon
some algorithm, such as the neural network algorithm described herein. Another way to
view page clustering is to group "related" pages into a cluster, where the "relation" can be any
predefined characteristic or characteristics, such as related content. In this instance, each cluster
may have its own individual cluster gateway or router to distribute requests across the
servers in the cluster.

Each server in the farm (which can include the gateway router itself) typically will have
certain files stored in cache memory. When the server receives a request for a file, if the server
finds the page in cache, it returns it to the user (through the gateway or directly to the user)
without needing to forward the request to the server's main memory or shared server file storage.
If the page is not in the cache, server main memory, or common server memory, the server, acting
as a proxy server, can function as a client (or have the router function as a client) on behalf of the
user, to use one of its own IP addresses to request the page from a server remote from the
Web farm. When the page is returned, the proxy server relates it to the original request and
forwards it on to the user.

In a proxy cache such as those maintained by ISPs, clients request pages from a local server
instead of directly from the source. The local server gets the page, saves it on disk and forwards
it to the client. Subsequent requests from other clients get the cached copy, which is much faster
(i.e. reduces latency time) and does not consume Internet bandwidth.

A client is defined as a program that establishes connections to the Internet, whereas a
Web server stores information and serves client requests. A distributed Web-server system or
web farm is any architecture of multiple Web servers that has some means of spreading the
client requests to the farm's servers. A session is an entire period of access from a single client
to a given Web site. A session may issue many HTML page or file requests. Typically a
Web page consists of a collection of objects, and an object request requires an access to a server.

The algorithm used in this invention is an aspect of competitive learning that will next be 
generally described. 
Competitive Learning - Background

In the simplest competitive learning networks there is a single layer of output units $O_i$, or
output nodes, each fully connected to a set of inputs $x_j$ (input nodes) via connection weights
$w_{ij}$ (generally $\geq 0$). Such a system is shown in Figure 2. A description of the algorithm follows.

Let $x$ be an input vector (with components $x_j$) to a network of two layers with an
associated set of weights $w_{ij}$. The standard competitive learning rule is given by:

$\Delta w_{i^*j} = \eta\,(x_j - w_{i^*j})$

$\eta$ being a scalar. This rule "moves" $w_{i^*}$ towards $x$. The $i^*$ implies that only the set of weights
corresponding to the winning node is updated. The winning node is taken to be the one with the
largest output. Another way to write this is:

$\Delta w_{ij} = \eta\,O_i\,(x_j - w_{ij})$,

where $O_i = 1$ for $i$ corresponding to the largest output, and $O_i = 0$ otherwise.

This is the adaptive Kohonen approach. The usual definition of competitive learning requires a
winner-take-all strategy. In many cases this requirement is relaxed to update all of the weights in
proportion to some criterion, such as in a neighborhood of the "winning" node.
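
By way of illustration only, the winner-take-all rule above might be sketched in Python as follows. The function name `winner_take_all_update` and the choice of a weighted sum as each node's output are assumptions made for this sketch; the text above says only that the winner is the node with the largest output.

```python
def winner_take_all_update(weights, x, eta):
    """One step of the standard competitive learning rule.

    weights[i][j] is the weight from input node j to output node i.
    Returns the index i* of the winning output node.
    """
    # Assumption: a node's output is the weighted sum of its inputs.
    outputs = [sum(w * xj for w, xj in zip(row, x)) for row in weights]
    winner = max(range(len(weights)), key=lambda i: outputs[i])
    # Only the winner's weights move toward the input:
    # delta w[i*][j] = eta * (x[j] - w[i*][j]).
    weights[winner] = [w + eta * (xj - w) for w, xj in zip(weights[winner], x)]
    return winner


weights = [[1.0, 0.0], [0.0, 1.0]]
winner = winner_take_all_update(weights, x=[0.0, 2.0], eta=0.5)
print(winner, weights)
```

In this small example the second node wins (index 1) and its weights move halfway toward the input, from [0.0, 1.0] to [0.0, 1.5], while the first node's weights are left unchanged.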

Kohonen's algorithm adjusts weights from common input nodes to N output nodes
arranged in a 2-dimensional grid, shown in Figure 2, to form a vector quantizer. Input vectors are
presented sequentially in time and certain of the weights are modified according to the
update rule chosen, and the neural network evolves or learns. Kohonen's algorithm organizes
weights such that "close" nodes are sensitive to physically similar inputs. A detailed description
of this algorithm follows.

Let $x_1, x_2, \ldots, x_n$ be a set of input vector components, which defines a point in
n-dimensional space. The output units $O_i$ are arranged in an array and are fully connected to the
input via the weights $w_{ij}$. A competitive learning rule is used to choose a "winning" weight
vector $w_{i^*}$, such that, for each $j$,

$|w_{i^*j} - x_j| \leq |w_{ij} - x_j|$ for all $i$.

For instance, in the case of a two component vector $x$ ($x_1$ and $x_2$) and three outputs, with six
corresponding weights $w_{ij}$, $i = 1, 2, 3$; $j = 1, 2$ (fully connecting the input vector to the outputs),
Kohonen's algorithm chooses the minimum of the following 3 "distances" (using the $\ell_2$ norm):

$(x_1 - w_{11})^2 + (x_2 - w_{12})^2$ (and correspondingly updating $w_{11}$ and $w_{12}$);
$(x_1 - w_{21})^2 + (x_2 - w_{22})^2$ (and correspondingly updating $w_{21}$ and $w_{22}$); or
$(x_1 - w_{31})^2 + (x_2 - w_{32})^2$ (and correspondingly updating $w_{31}$ and $w_{32}$)

with Kohonen's update rule generally given by:

$\Delta w_j = \eta\,h(j, i^*)\,(x - w_j^{\mathrm{old}})$ for each $j$

Here $w_j$ is the weight vector of output unit $j$, and $h(j, i^*)$ is a neighborhood function such that
$h(j, i^*) = 1$ if $j = i^*$ but falls off with distance $|r_j - r_{i^*}|$ between units $j$ and $i^*$ in the
output array. The winner and "close by" weights are updated
appreciably more than those further away. A typical choice for $h(j, i^*)$ is:

$h(j, i^*) = e^{-|r_j - r_{i^*}|^2 / (2\sigma^2)}$

where $\sigma$ is a parameter that is gradually decreased to contract the neighborhood. $\eta$ is also
decreased to ensure convergence.
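
As a minimal sketch of one Kohonen update step, assuming a Gaussian neighborhood function and squared Euclidean distance for choosing the winner, the procedure might look like the following. The function name `kohonen_update` and the `positions` argument (the grid coordinates r_j of each output unit) are hypothetical names introduced for illustration.

```python
import math


def kohonen_update(weights, positions, x, eta, sigma):
    """One Kohonen step: find the winner, then move every unit toward x,
    scaled by a Gaussian neighborhood centered on the winner."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    # Winner i* is the unit whose weight vector is closest to x.
    winner = min(range(len(weights)), key=lambda i: sq_dist(weights[i], x))
    for j in range(len(weights)):
        # h(j, i*) = exp(-|r_j - r_i*|^2 / (2 sigma^2)); equals 1 for the winner.
        h = math.exp(-sq_dist(positions[j], positions[winner]) / (2.0 * sigma ** 2))
        weights[j] = [w + eta * h * (xk - w) for w, xk in zip(weights[j], x)]
    return winner


weights = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]]
positions = [(0,), (1,), (2,)]  # three units arranged on a line
print(kohonen_update(weights, positions, x=[1.0, 2.0], eta=0.5, sigma=1.0))
print(weights)
```

Here the middle unit wins and receives the full update, while its two neighbors are moved toward the input by a smaller amount. In practice both `eta` and `sigma` would be decreased over successive presentations, as described above.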

The allocation rule used in the present invention is a modification of the traditional
Kohonen rule and will be described for a Web farm having N servers that service the requests for
Web pages or files (files and pages are used interchangeably to identify a data set which is
accessible through the site server/router or gateway via an identifying address), where the servers
are identified as $S_1, \ldots, S_N$, as shown in Figure 3.

As described, the Web-server farm is scalable and uses one URL to provide a single
interface to users. For example, a single domain name may be associated with many IP addresses
and each address may belong to a different Web server. The collection of Web servers is
transparent to the users. In the current invention, the input vector to the input layer consists of a
function of the page requests and the page identifier, and the output layer consists of the server
identification.

Each request for a Web page is identified as a duplet $\langle P_i, R_i \rangle$ where $1 \leq i \leq M$, M being
the number of requests (pages, objects or files) serviced by the Web farm at a predetermined
time (if the farm is clustered, M could be the number of pages in the cluster, and the algorithm
would be implemented by the cluster gateway or cluster router). P represents the page identifier,
and R represents a function of the number of requests for that page over a predetermined period
of time. The dispatching algorithm may deal only with a subset of the total number of pages or
files in the Web farm (such as the most frequently accessed pages); how many pages to use in the
algorithm is a design decision, and in simulations, a range from 20 to 1000 was used. Requests are
handled through the router (which may be a server or proxy-server in the web farm). The router
will act as the dispatcher, routing a request either to a particular server, or to a cluster router for
further handling.

Figure 4 presents a conceptual framework of the proposed model. Initially, we treat the
web site as a connected graph, as shown in Figure 4a. Each node of the connected graph
corresponds to a web page, and the links
between the pages are the paths between the nodes. This directed graph can be translated into a
tree structure (shown in Figure 4c) using some rule. For instance, a tree structure can be
determined so that an in-order traversal of this tree would output web pages sorted in order of
decreasing page requests.

The collection of Web pages can be partitioned into clusters in several ways. One
approach is grouping the pages such that the average hit count is similar among different clusters
to assist in load balancing between clusters. Another way is to group pages according to a
relationship between page content, page links, or other interpage relationship to assist in
reducing access latency time. One way to group pages into clusters is to partition the tree
structure into left subtree and right subtree and allocate the subtrees to the clusters, such as
shown in Figure 4c, where 4 clusters are formed. The way clusters are formed will
have an effect on allocation of pages to servers, since, ideally, clustered pages will be found on
the same server. For purposes of simplification and further discussion, assume that each page is
its own cluster (that is, that a cluster has only one page associated with the page ids) and has an
associated request count.

In the simplified structure shown in Figure 4b, we have one page per cluster, and the
neural network's input layer's nodes are associated with the page identifiers, $P_i$. The vector input
to the input layer, for a particular page $P_k$, will be a vector where all entries are zero other than
the k-th entry corresponding to the particular page request $P_k$, and the value for this vector
component will be a function of the number of requests for that particular page, $R_k$, as measured
from some pre-determined time. In essence, we now have the pages P and the associated request
count R with the following one page per cluster structure

$\{\langle P_1, R_1 \rangle\}, \{\langle P_2, R_2 \rangle\}, \ldots, \{\langle P_M, R_M \rangle\}$

that has to be mapped into the server farm. Initial allocation of clusters to the servers is based on
some initial configuration of clusters.
The process now is to create connections to the servers to "learn" the mapping from the pages to
the servers and to adapt to changes in the mapping.
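
For illustration, the input vector described above (zero everywhere except the entry for the requested page) might be built as in the following sketch. The function name `build_input_vector` and the default identity choice for the function f of the request count are assumptions made for this example only.

```python
def build_input_vector(page_index, request_counts, f=lambda r: r):
    """Return the input-layer vector for a request to page `page_index`.

    All entries are zero except the one for the requested page, which holds
    f(R_k), a function of that page's request count.
    """
    vector = [0.0] * len(request_counts)
    vector[page_index] = float(f(request_counts[page_index]))
    return vector


counts = [5, 12, 7]  # hypothetical request counts R_1..R_3
print(build_input_vector(1, counts))
print(build_input_vector(1, counts, f=lambda r: r / sum(counts)))
```

The second call shows a normalized request count in place of the raw count; either choice fits the description, which leaves the function of the request count open.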

Mathematical Formulation Using Competitive Learning

The model uses a two layer neural network: layer W and layer S. Each node in the input
layer W corresponds to a page id, and the layer S corresponds to server ids. Define the weight
$w_{ij}$ as the connection "strength" from the page $P_i$ to server $S_j$. A pictorial representation of this
architecture is given in Figure 3.

Now we can formulate the problem of assigning web pages to the servers as a mapping
for the placement of $\langle P_i, R_i \rangle \in W$ onto a server space S as

$\phi_k : W \to S$, such that $P_i \in W \to S_j \in S$; ($i = 1, \ldots, M$ and $j = 1, \ldots, N$)

with the condition to allocate (classify) the pages ($P_i$) such that the pages are distributed
substantially equally among the servers ($S_j$) to ensure equitable load among the servers and at the
same time maximize the hits in the servers' cache in order to reduce latency and request service
time. The objective is to optimize two things: (1) to increase the number of hits, in the sense that
the server chosen to service the request has a high probability that the requested page is stored in
cache, thereby accelerating the performance of the web site to allow fast response and fast
loading of dynamic web pages; and (2) to distribute the page requests equitably among the
servers.

The server $S_k$ for a given page $P_m$ is chosen using a modified Kohonen competitive learning 
selection, as follows. Choose the server $k$ such that 

$$\lvert f(R_m) - w_{mk} \rvert = \min_{j}\, \operatorname{dis}\big(f(R_m), w_{mj}\big), \quad j = 1, \dots, N \qquad (1)$$

where $R_m$ is the number of requests for the given page over a predetermined period of time 
(this period could be a rolling period, and may include in the count the current request), $f$ is 
some function of $R$, and "dis" is a distance measurement, as measured by some suitable metric, 
such as the "absolute value" of the difference. 

For instance, with two pages $P_1$ and $P_2$ and three servers $S_1$, $S_2$, and $S_3$, and weights 
$w_{ij}$, $i = 1, 2$, $j = 1, \dots, 3$, this modified rule chooses the server $k$ for page request $P_1$ 
(with page request number $R_1$) as the minimum of the following three numbers (using the function 
$f(R_1) = R_1$ and the metric = absolute value, or $\ell_1$ norm): 

abs($R_1 - w_{11}$) and update $w_{11}$; 
abs($R_1 - w_{12}$) and update $w_{12}$; or 
abs($R_1 - w_{13}$) and update $w_{13}$. 

As an alternative, $f(R_i)$ can be a "normalized" request count, such as $f(R_i) = R_i / \sum_j R_j$. 
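The selection rule of equation (1) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `select_server`, its parameters, and the example weights are hypothetical and do not appear in the specification, and the metric is taken to be the absolute value of the difference.

```python
def select_server(request_count, weights, total_requests=None):
    """Return the zero-based index k that minimizes |f(R) - w_k| (equation (1)).

    weights holds the connection strengths from one page to each server.
    With total_requests=None, f(R) = R; otherwise f(R) = R / total_requests,
    the "normalized" request count.
    """
    f_r = request_count if total_requests is None else request_count / total_requests
    distances = [abs(f_r - w) for w in weights]
    # min() returns the first index on ties, so ties go to the lowest-numbered server.
    return min(range(len(weights)), key=distances.__getitem__)


# Hypothetical weights w_11, w_12, w_13 for page P_1 and three servers.
weights_page_1 = [0.9, 0.35, 0.1]
print(select_server(0.3, weights_page_1))                    # f(R) = R
print(select_server(3, weights_page_1, total_requests=10))   # f(R) = R / sum of R
```

Both calls compare 0.3 against the three weights, so both pick the second server (index 1), whose weight 0.35 is closest.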
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Learning is achieved by updating the connection strength between page and the winning 
server using the general update rule: 

$$\Delta w_{ik} = \text{neighborhood function} + \text{load balancing function}.$$

For instance, a preferred update rule is as follows: 

$$\Delta w_{ik} = \eta\, \Lambda(R_i, w_{ik}, K)\,(R_i - w_{ik}) + \alpha K \Big(\big(\textstyle\sum_{x} w_{ix}\big) - N w_{ik}\Big) \qquad (2)$$

The first term is the neighborhood function and the second term is a load balancing function. 
Here $\eta$, $\alpha$, and $K$ are the parameters that determine the relative strength of maximizing 
hits versus balancing the load among the servers, and $\Lambda(R_i, w_{ik}, K)$ is given by 



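$$\Lambda(R_i, w_{ik}, K) = \frac{e^{-g^2/(2K^2)}}{\Phi(d, K)}, \quad \text{where } g = (R_i - w_{ik}), \quad \Phi(d, K) = \sum_{j} e^{-d_j^2/(2K^2)}, \quad \text{and } d_j = (R_i - w_{ij}),\ j = 1, \dots, N \qquad (3)$$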
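As a rough illustration, the neighborhood function can be computed as below. The sketch assumes that equation (3) is a Gaussian of the winning distance divided by the sum of the Gaussians over all N servers; that reading is an interpretation of the formula, and the function name `neighborhood` and its arguments are hypothetical.

```python
import math


def neighborhood(request_count, weights, k, K):
    """Assumed form of equation (3): normalized Gaussian weight for server k.

    weights holds w_i1 .. w_iN for one page; K sets the width of the Gaussian.
    """
    gauss = [math.exp(-((request_count - w) ** 2) / (2.0 * K * K)) for w in weights]
    # If K is very small and every weight is far from request_count, all terms
    # can underflow to 0.0 and this division would fail; a real implementation
    # would need to guard against that case.
    return gauss[k] / sum(gauss)


values = [neighborhood(0.5, [0.1, 0.5, 0.9], k, K=0.2) for k in range(3)]
print(values)  # the middle server, whose weight equals the request count, scores highest
```

Under this reading the values over all servers sum to 1, and a larger K spreads the update more evenly across servers.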

Integrating equation (2), and after some algebraic manipulation, an energy function is obtained: 

$$E = \eta K \ln \sum_{m,k} e^{-d_{mk}^2/(2K^2)} + \alpha \sum_{i,j} \big(w_{i,j-1} + w_{i,j+1} - 4 w_{i,j} + w_{i-1,j} + w_{i+1,j}\big)^2 \qquad (4)$$

Since the update rule given in equation (2) is of the form $\Delta w_{ik} = -\frac{\partial E}{\partial w_{ik}}$, the update rule is a 
gradient descent rule for the energy function given in equation (4). 

Again, as can be seen by a close examination of equation (2), the neighborhood function, 
$\eta\, \Lambda(R_i, w_{ik}, K)\,(R_i - w_{ik})$, tends to drive the updated weight (i.e. $w_{ik} + \Delta w_{ik}$) toward the 
request count. The load balancing function, $\alpha K\big((\sum_x w_{ix}) - N w_{ik}\big)$, tends to drive the updated 
weight toward the average weight, that is, to equalize the weights. (This particular load balancing 
function can be rewritten as $\alpha K N\big((\sum_x w_{ix})/N - w_{ik}\big)$, or $\alpha K N(\text{average weight} - w_{ik})$.) 
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Note, in the given example of a single page per cluster, since only a single weight will be 
updated, the neighborhood function can be a constant. If the pages are clustered, then it may be 
appropriate to use a true neighborhood function, updating those weights directed to all servers in 
the cluster. 
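A minimal sketch of the update in equation (2) follows, assuming the single-page-per-cluster case in which the neighborhood function is a constant. The function name `update_weight`, the default values, and the example numbers are hypothetical.

```python
def update_weight(weights, k, request_count, eta, alpha, K=1.0, neighborhood=1.0):
    """Apply equation (2) to the winning weight w_ik and return a new weight list.

    weights holds w_i1 .. w_iN for one page; k is the index of the winning server.
    """
    n = len(weights)
    w_k = weights[k]
    # First term: pulls the winning weight toward the request count.
    pull_to_request = eta * neighborhood * (request_count - w_k)
    # Second term: pulls it toward the average weight, since
    # alpha*K*(sum(w) - N*w_k) equals alpha*K*N*(average - w_k).
    pull_to_average = alpha * K * (sum(weights) - n * w_k)
    updated = list(weights)
    updated[k] = w_k + pull_to_request + pull_to_average
    return updated


weights = [0.2, 0.6, 0.4]
print(update_weight(weights, k=0, request_count=0.5, eta=0.5, alpha=0.0))
print(update_weight(weights, k=0, request_count=0.5, eta=0.5, alpha=0.1))
```

With `alpha=0.0` only the first term acts and the winning weight moves halfway from 0.2 toward 0.5; with `alpha=0.1` it is also nudged toward the average weight of 0.4.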

Heuristics on the Selection of Parameters 

Ideally, the neighborhood function $\Lambda(R_i, w_{ik}, K)$ is 1 for the winning server ($j = k$) and falls 
off with the distance $\lvert R_i - w_{ij} \rvert$. In general, the neighborhood is selected such that servers 
having related pages are "closer" in a neighborhood (furthering the likelihood of finding pages in 
cache). The first term on the right hand side of equation (2) pushes the selected weight $w_{ik}$ 
toward the request count $R_i$, thereby increasing the probability that page requests for pages that 
are in server $S_k$'s cache will be directed to the server $S_k$ (as then dis($w_{ik}$, $R_i$) should be a 
minimum). The second term on the right of equation (2) increases the likelihood that no one server 
will get overloaded, that is, that the page requests are distributed evenly among the servers. By a 
proper balance of the parameters $\eta$, $\alpha$, and $K$, we can direct the flow of traffic in an 
optimal fashion. 
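The balance between the two terms can be made concrete with a short worked equation. This is a sketch under two assumptions that are not stated in the text: the neighborhood function is taken as the constant $\Lambda = 1$ (as in the one-page-per-cluster case), and the other $N - 1$ weights for page $i$ are held fixed, with $S'$ denoting their sum and $N > 1$. Setting $\Delta w_{ik} = 0$ in equation (2) then gives the weight at which the two terms cancel:

```latex
\begin{aligned}
0 &= \eta\,(R_i - w_{ik}) + \alpha K\big((w_{ik} + S') - N\,w_{ik}\big) \\
  &= \eta R_i + \alpha K S' - \big(\eta + \alpha K (N-1)\big)\,w_{ik} \\
\Rightarrow\quad w_{ik}^{*} &= \frac{\eta\,R_i + \alpha K (N-1)\,\bar{w}'}{\eta + \alpha K (N-1)},
\qquad \bar{w}' = \frac{S'}{N-1}
\end{aligned}
```

Under these assumptions the resting weight is a weighted average of the request count $R_i$ and the mean $\bar{w}'$ of the other weights: a larger $\eta$ pulls it toward $R_i$, a larger $\alpha K$ pulls it toward the other weights, and $\alpha = 0$ gives $w_{ik}^{*} = R_i$.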

$\eta$, $\alpha$, and $K$ are related as follows: $\eta \propto \frac{1}{\alpha K}$. Higher $\eta$ and lower $\alpha K$ mean that 
page hits are emphasized over load balancing, while higher $\alpha K$ means more weight is given to 
load balancing. Putting $\alpha = 0$ would mean increasing web page hits without regard for load 
balancing. In simulations, it has been found that a high hit ratio is maintained using small values 
of $\alpha$ while still having reasonable load balancing among servers. 

Simulation Using the Method for One Page One Cluster 

An outline of the method implemented for simulation with one page per cluster follows 
and is flowcharted in Figure 5: 

1) Initialize M, N; 

2) Initialize with random values the weights {$w_{ij}$} between the page requests and the 
servers, and select parameters $\eta$ and $\alpha$. 

3) While (there are more page requests) 

//begin while// 

3.1) Calculate $\lvert (R_i / \sum_j R_j) - w_{ik} \rvert$ and select the k which minimizes this value. 

3.2) Determine whether the selected server is a "hit" or "miss". A hit is 
counted if the selected server is the server that serviced the previous request for 
this particular page, the assumption being that the server servicing the previous 
request is more likely than other servers to have the page still in cache memory. A 
"miss" is counted if the selected server does not correspond to the previously 
servicing server. 

3.3) Update the server selection for this page request to correspond to the 
server chosen. 

3.4) Update the weight using $\Delta w_{ik} = \eta\,(R_i - w_{ik}) + \alpha\big((\sum_x w_{ix}) - N w_{ik}\big)$ 

Simulation Results 


The characteristics of Web traffic and the self-similarity inherent in real Web traffic can 
be simulated by modeling the traffic through a heavy-tailed distribution of the type 
$P[X > x] \sim x^{-\alpha}$ as $x \to \infty$ for $0 < \alpha < 2$. The results correspond to a Pareto distribution, with 
probability density function $p(x) = \alpha k^{\alpha} x^{-\alpha - 1}$, where $\alpha = 0.9$ and $k = 0.1$. Data sets 
corresponding to use of the Pareto distribution for page requests are referred to as "Non-Uniform." 
For purposes of simulation, Web traffic was also simulated using a uniform probability distribution 
for the page requests, that is, each page is equally likely to be requested. 
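One way to draw values from the Pareto density above is inverse-transform sampling, sketched below with the stated parameters (shape 0.9, scale 0.1). The text does not say how a sampled value is turned into a page id, so the `to_page_id` mapping is purely an assumption for illustration, as are both function names.

```python
import random


def sample_pareto(alpha=0.9, k=0.1, rng=random):
    """Draw x >= k with density p(x) = alpha * k**alpha * x**(-alpha - 1)."""
    u = rng.random()                          # uniform on [0, 1)
    # Invert the CDF F(x) = 1 - (k / x)**alpha; 1 - u lies in (0, 1].
    return k * (1.0 - u) ** (-1.0 / alpha)


def to_page_id(x, num_pages, k=0.1):
    """Hypothetical mapping: bucket x in units of k and clip to the last page."""
    return max(0, min(int(x / k) - 1, num_pages - 1))


rng = random.Random(0)                        # fixed seed so runs are repeatable
requests = [to_page_id(sample_pareto(rng=rng), num_pages=150) for _ in range(10)]
print(requests)
```

With this mapping, low page ids are requested far more often than high ones, which gives the skewed ("Non-Uniform") request stream; a uniform stream would use `rng.randrange(num_pages)` instead.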

The neural network algorithm was compared, using simulations, to a Round Robin (RR) algorithm, a 
Round Robin 2 (RR2) algorithm, and a special case of the Adaptive TTL algorithm. In the RR2 
algorithm, a Web cluster is partitioned into two classes: Normal domains and Hot domains. This 
partition is based on domain load information. In this strategy, the Round Robin scheme is applied 
separately to each of the domains. In the implementation of the Adaptive TTL algorithm, a lower TTL 
value is assigned when a request originates from a Hot domain and a higher TTL value is assigned 
when it originates from a Normal domain; in this way the skew on Web pages is reduced. 
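For reference, the plain Round Robin baseline can be sketched as follows, scoring hits by the rule of step 3.2 (a hit is counted when the chosen server also serviced the previous request for the same page). The function name and the convention that a first request for a page counts as a miss are assumptions; RR2 and Adaptive TTL are not sketched because the text does not give enough detail to do so.

```python
def round_robin_hit_ratio(page_ids, num_servers):
    """Hit ratio when request number t is sent to server t mod num_servers."""
    last_server = {}                       # page id -> server of its previous request
    hits = 0
    for t, pid in enumerate(page_ids):
        server = t % num_servers           # plain Round Robin: cycle through the servers
        if last_server.get(pid) == server:
            hits += 1                      # same server as the previous request: a hit
        last_server[pid] = server
    return hits / len(page_ids) if page_ids else 0.0


print(round_robin_hit_ratio([0, 1, 0, 1], num_servers=2))
print(round_robin_hit_ratio([0, 0, 0, 0], num_servers=2))
```

In the first call each page happens to return to the same server, so the two repeat requests are hits; in the second, consecutive requests for the same page always land on different servers and none is a hit.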

The simulations were run using a variety of values for the update parameters $\eta$ and $\alpha$. 
For instance, $\eta$ varied between 0.2 and 0.8, while $\alpha$ varied with the number of servers, as 
1/#servers. In all cases, the results using the Neural Network implementation were similar, 
showing a high initial hit ratio and converging on a hit ratio of 1 as the number of requests 
increased. Because the variances in the results are minor, specific graphs for the various 
parameter values are not shown. The following table gives characteristics of the simulations. 



Sample Size                    Number of Web pages ranged from 150 to 1050, and the 
                               statistics were collected at intervals of 50 pages each. 

Number of Servers              Statistics were collected for 4, 8, 16, and 32 servers. 

Web Page Distributions Used    Uniform and Non-Uniform (Pareto) 

Algorithms                     Neural Network (NN), with $\eta$ varying between 0.2 and 0.8 
                               and $\alpha$ varying with the number of servers as 1/#servers; 
                               Round Robin (RR); 
                               Round Robin 2 (RR2); 
                               and Adaptive Time-to-Live (TTL) 

Table 1 Simulation Characteristics 

The comparison charts in the following discussion relate only to the Round Robin scheme 
and the Neural Net based algorithm. The results (hit ratios) for the Adaptive TTL algorithm varied 
widely for different input sizes of Web pages and for different input page distributions, but never 
ranged higher than 0.68. In these tables, "Hit Ratio" corresponds to the following ratio, where 
Hit = number of page requests to the "proper server" and Miss = number of page requests to the 
"improper server"; Hit Ratio = Hit/(Hit + Miss): 
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Graph 1 Performance of page placement algorithm using competitive learning (Neural 
Network) versus Round Robin Algorithms (for Non-Uniform Input Data Distribution) 

Graph 2 Performance of page placement algorithm using competitive learning (Neural 
Network) versus Round Robin Algorithms (for Uniform Input Data Distribution) 
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As can be seen from Graph 1, the Neural Network (NN) competitive learning algorithm performs 
much better than the Round Robin schemes (also RR2, not shown) when input pages 
follow a Pareto distribution. As the number of input pages increases, the algorithm achieves a 
hit ratio close to 0.98, whereas the round robin schemes never achieved a hit ratio of more than 
0.4. 

For the neural network algorithm, the lower hit ratio (0.86) with a smaller number of pages is 
attributed to the algorithm still learning, but as the algorithm learns, the hit ratio 
asymptotically stabilizes at 0.98 for a larger number of pages. 

For a uniform distribution of input pages, the NN algorithm performs similarly as for a non-uniform 
distribution and is much better than the Round Robin schemes (See Graph 2). 





               RR                 NN 

Uniform        (0.31, 150, 4)     (0.98, 1050, 4) 

Non-Uniform    (0.32, 150, 4)     (0.98, 1050, 4) 

Table 2 Comparison of maximum hit ratio achieved, input size, and servers 

The Round Robin scheme never achieves a hit ratio higher than 0.32, whereas NN achieves hit ratios 
close to 0.98 (See Table 2). 





               RR                   NN 

Uniform        (0.03, 1050, 32)     (0.85, 150, 32) 

Non-Uniform    (0.02, 1050, 32)     (0.86, 150, 32) 

Table 3 Comparison of minimum hit ratio achieved, input size, and servers 

As a worst case, NN achieves a hit ratio as high as 0.85 for 32 servers, whereas RR schemes 
go as low as a 0.02 hit ratio (See Table 3). 
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Conclusions 


An analysis indicates the following results: 

(1) The performance of the NN algorithm increases considerably (from a hit ratio of 0.85 
to 0.98, as compared to 0.02 to 0.38 for the Round Robin scheme) as the traffic increases, 
whereas the performance of Round Robin decreases. This result holds true irrespective 
of the number of servers. This is a result of the push of a page towards the same server 
based on the learning component in equation (2). 

(2) For a uniform distribution of Web page requests and at a lower traffic rate with a 
large number of servers (16 and 32), the performance of both algorithms is acceptable. As 
the traffic increases, the NN algorithm performs much better than the RR scheme. 

(3) For a non-uniform distribution (Pareto distribution), the NN algorithm performs 
considerably better at both lower and higher traffic rates, irrespective of 
the number of servers. 

For the Pareto distribution, which closely models real Web traffic, the better performance of the NN 
algorithm at larger input rates of Web pages is a very attractive result. 



{B0178323.3} 



Figure 1 (diagram not reproduced: a router connected to several HTTP servers, including those 
labeled S1 and S4, each with a cache, over a common file system or individual storage of each server) 






Figure 3 
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FIGURE 5 



NEURAL NETWORK ALGORITHM 

Generate initial weights W[i][j], where i = 0 to NS - 1 and j = 0 to NDP - 1 

For each input page id PID: 

    total_page_count = total_page_count + 1 
    count[PID] = count[PID] + 1 
    hitratio = count[PID] / total_page_count 

    Find j such that it minimizes abs(hitratio - W[j][PID]), where j = 0 to NS - 1 

    sumw = Σ W[x][y], where x = 0 to NS - 1 and y = 0 to NDP - 1 

    If the selected server j serviced the previous request for PID: 
        hit[j] = hit[j] + 1 
    Otherwise: 
        miss[PSMap[PID]] = miss[PSMap[PID]] + 1 
        PSMap[PID] = j 

    W[j][PID] = W[j][PID] + η * (hitratio - W[j][PID]) + γ * (sumw - W[j][PID]) 

STOP 
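The loop outlined in steps 1) through 3.4) and flowcharted in Figure 5 can be sketched in Python as follows. This is a minimal illustration, not the implementation used for the reported simulations: the function and variable names are hypothetical, a first request for a page is assumed to count as a miss, and the weight update uses the form given in step 3.4 (sum over the servers for the requested page, with the factor N) rather than the `sumw` term as printed in the flowchart.

```python
import random


def simulate(page_ids, num_servers, num_pages, eta=0.5, alpha=None, seed=0):
    """Run the one-page-per-cluster loop and return (hits, misses, hit_ratio)."""
    rng = random.Random(seed)
    if alpha is None:
        alpha = 1.0 / num_servers           # the text varies alpha as 1/#servers
    # weights[j][pid]: connection strength between server j and page pid (step 2).
    weights = [[rng.random() for _ in range(num_pages)] for _ in range(num_servers)]
    count = [0] * num_pages                 # requests seen so far, per page
    last_server = [None] * num_pages        # plays the role of PSMap in Figure 5
    hits = misses = total = 0
    for pid in page_ids:
        total += 1
        count[pid] += 1
        ratio = count[pid] / total          # normalized request count R_i / sum_j R_j
        # Step 3.1: choose the server whose weight is closest to the ratio.
        k = min(range(num_servers), key=lambda j: abs(ratio - weights[j][pid]))
        # Step 3.2: a hit if the same server handled the previous request for pid.
        if last_server[pid] == k:
            hits += 1
        else:
            misses += 1                     # assumption: a first request is a miss
        last_server[pid] = k                # Step 3.3
        # Step 3.4: learning term plus load-balancing term.
        page_sum = sum(weights[j][pid] for j in range(num_servers))
        w = weights[k][pid]
        weights[k][pid] = w + eta * (ratio - w) + alpha * (page_sum - num_servers * w)
    return hits, misses, (hits / total if total else 0.0)


print(simulate([0, 1, 0, 2, 0, 1], num_servers=4, num_pages=3))
```

The sketch makes no attempt to reproduce the reported hit ratios; it only shows how the selection, hit counting, and weight update fit together in one loop.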
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